Internet of Things: Concepts and System Design by Milan Milenkovic

Author:Milan Milenkovic
Language: eng
Format: epub
ISBN: 9783030413460
Publisher: Springer International Publishing


Performance Optimization

Getting an ML model to an operational state often requires significant computational resources and time for model training and testing. While model-development costs are usually front-loaded and may take on the order of weeks, it also makes sense to optimize the production version, since it may remain in use for a long time and benefit from speed improvements. Running the model efficiently lowers the cost and/or raises the execution speed of delivering inferences and predictions. Another reason to optimize is to enable the model to run in resource-constrained environments, such as edge nodes and things. Whether to spend time on either or both optimizations depends on the specific circumstances.

Hardware and runtime optimization options include tuning the implementation to run on the target hardware platform, such as a specific CPU and OS combination. Some hardware manufacturers do this for a select set of popular ML libraries [35]. All other things being equal, picking a supported combination of hardware and library can be beneficial for performance. One of the important factors in execution speed turns out to be the type of arithmetic used by the model implementation: fixed-point (integer) or floating-point. A given piece of hardware may perform much better with one than with the other, and in general the fixed-point variants tend to execute faster.
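The fixed-point option is commonly reached through post-training quantization: weights trained in floating point are mapped to small integers plus a scale factor. The NumPy sketch below is a minimal illustration of one symmetric 8-bit scheme; the function name and scaling choice are illustrative, not a specific library's API.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric post-training quantization of a float array to int8.

    Returns the int8 values and the scale needed to recover approximate
    floating-point values (dequantized = q * scale).
    """
    scale = max(np.max(np.abs(weights)) / 127.0, 1e-12)  # avoid divide-by-zero
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

# Example: quantize a small weight matrix and measure the rounding error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_restored = q.astype(np.float32) * scale
max_err = np.max(np.abs(w - w_restored))  # bounded by half a quantization step
```

The rounding error per weight is at most half of one quantization step (`scale / 2`), which is why such models usually lose little accuracy while gaining the speed of integer arithmetic.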

Another option is to use general-purpose graphics processing units (GP-GPUs). These are specialized hardware units originally developed as graphics accelerators for gaming applications. They contain a large number of processing units, hundreds or thousands of cores, that operate in parallel with high memory bandwidth. They are well suited to common ML operations, such as matrix multiplication, and can greatly reduce training and inferencing times. Model training or execution on GPUs may require special software tools, such as a GPU-specific programming library. Fortunately, GPUs are becoming widely available and integrated with commercial platforms and AI offerings, such as TensorFlow. GPU hardware can be purchased on commercial graphics cards and in specialized AI cloud configurations, and it is also available on a pay-as-you-go basis on commercial cloud platforms.
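To see why GPUs help, note that applying a dense network layer to a whole batch of inputs reduces to one matrix multiplication, a highly data-parallel operation that maps naturally onto thousands of cores. The sketch below uses plain NumPy on the CPU to show the computation itself; a GPU-backed library such as TensorFlow dispatches the same operation to the accelerator. The layer sizes are illustrative.

```python
import numpy as np

batch = 32           # number of inputs processed at once
n_in, n_out = 128, 64

rng = np.random.default_rng(1)
x = rng.standard_normal((batch, n_in)).astype(np.float32)   # input batch
w = rng.standard_normal((n_in, n_out)).astype(np.float32)   # layer weights
b = np.zeros(n_out, dtype=np.float32)                       # layer bias

# One dense layer with ReLU activation: the matrix product x @ w is the
# data-parallel kernel that a GPU executes across its cores.
activations = np.maximum(x @ w + b, 0.0)
```

Each of the `batch * n_out` output values can be computed independently, which is exactly the structure GPU hardware exploits.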

More recently, field-programmable gate arrays (FPGAs) have been gaining attention as hardware accelerators. These are chips with a large number of logic gates, memory, and interconnections that can be programmed in the field for a specific application. Some versions also include an integrated CPU to handle general-purpose tasks and to feed the gate array. Functionally, FPGAs enable the creation of customized hardware configurations optimized for the execution of specific tasks, such as a particular ML model. This has great potential for optimization in uses that justify the added cost of custom development, which is usually performed using hardware description languages (HDLs) and requires specialized expertise.
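A rough sense of what such a custom design computes: the workhorse primitive of FPGA-based ML accelerators is the fixed-point multiply-accumulate (MAC), which an HDL design replicates across many parallel units. The Python sketch below models that arithmetic in software; the Q-format with 8 fractional bits and the function name are illustrative assumptions.

```python
def fixed_mac(xs, ws, frac_bits=8):
    """Dot product of two sequences in fixed point with frac_bits fraction.

    Inputs are integers holding value * 2**frac_bits; each product carries
    2*frac_bits fractional bits, so the accumulated sum is shifted back
    once at the end to restore the original scale.
    """
    acc = 0
    for x, w in zip(xs, ws):
        acc += x * w            # integer multiply-accumulate, no floats
    return acc >> frac_bits     # rescale back to frac_bits fraction

# Example: 1.5 * 0.25 + 2.0 * 0.5 = 1.375, encoded in Q8 (value * 256).
xs = [int(1.5 * 256), int(2.0 * 256)]
ws = [int(0.25 * 256), int(0.5 * 256)]
result = fixed_mac(xs, ws)      # fixed-point integer result
```

On an FPGA, each such MAC becomes a dedicated hardware block, and many of them operate concurrently, which is the source of the speedup for a model whose weights have been fixed.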

Another performance-enhancement possibility is the use of hardware specifically designed to accelerate the model development and execution stages. As discussed in the chapter "IoT Platforms", designers of one such



